# Multimodal video retrieval

GME-VARCO-VISION-Embedding

GME-VARCO-VISION-Embedding is a multimodal embedding model that represents text, images, and videos as vectors in a high-dimensional embedding space and computes the semantic similarity between them. It performs particularly well on video retrieval tasks.
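To make the retrieval idea concrete, here is a minimal sketch of how similarity in an embedding space can be used to rank videos against a text query. It assumes cosine similarity as the scoring function, which is a common choice but is not confirmed by the description above. The vectors are invented toy values, not outputs of the model, and the function names `cosine_similarity` and `rank_by_similarity` are hypothetical helpers rather than part of any library.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query_vec, candidates):
    """Return (name, score) pairs sorted from most to least similar."""
    scored = [
        (name, cosine_similarity(query_vec, vec))
        for name, vec in candidates.items()
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional vectors, invented for illustration only.
# A real embedding model would produce much higher-dimensional vectors.
query_embedding = [1.0, 0.0, 1.0]
video_embeddings = {
    "video_a": [0.9, 0.1, 0.8],
    "video_b": [0.0, 1.0, 0.0],
    "video_c": [0.5, 0.5, 0.5],
}

ranking = rank_by_similarity(query_embedding, video_embeddings)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

In this toy setup the video whose vector points in nearly the same direction as the query ranks first, and the one orthogonal to the query ranks last. In practice the candidate embeddings would be computed once and stored, so that only the query needs to be embedded at search time.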

Tags: Multimodal Fusion, Transformers, English
